Summary. One of the most interesting scientific challenges nowadays deals with
the analysis and the understanding of complex networks' dynamics and how their
processes lead to emergence according to the interactions among their components.
In this paper we approach the definition of new methodologies for the
visualization and the exploration of the dynamics at play in real dynamic social
networks. We present a recently introduced formalism called TVG (for time-varying
graphs), which was initially developed to model and analyze highly-dynamic and
infrastructure-less communication networks. As an application context, we chose
the case of scientific communities by analyzing a portion of the ArXiv repository
(ten years of publications in physics). The analysis presented in the paper passes
through different data transformations aimed at providing different perspectives
on the scientific community and its evolution.

On a first level we discuss the dataset by means of both a static and a temporal
analysis of the citation and co-authorship networks. Afterward, since scientific
communities are communities of practice (through co-authorship) and a citation
represents a deliberate selection pointing out the relevance of a work in its
scientific domain, we introduce a new transformation aimed at capturing the
interdependencies between collaboration patterns and citation effects, and how
they drive the evolution of a goal-oriented system such as Science.

Finally, we show how, through the TVG formalism and derived indicators, it is
possible to capture the interaction patterns behind the emergence (selection) of
a sub-community among others, as a goal-driven preferential attachment toward a
set of authors among which there are some key scientists (Nobel prizes) acting as
attractors on the community.
1 Introduction

One of the most interesting scientific challenges nowadays deals with the
analysis and the understanding of social networks' dynamics and how their
processes lead to emergence according to the interactions at play among
their components. Research efforts in this area strive to understand what the
driving forces behind the evolution of social networks are and how they are
articulated with social dynamics, e.g., opinion dynamics, epidemic or innovation
diffusion, team formation and so on ([7, 22, 15, 3, 28, 9, 31, 5, 30]). In this
paper we approach the definition of new methodologies for the visualization and
the exploration of the dynamics within real dynamic social networks. As an
example, we chose the case of scientific communities by analyzing a portion of
the ArXiv repository (ten years of publications in physics), focusing on the
social determinants (e.g. goals and potential interactions among individuals)
behind the emergence and the resilience of scientific communities. In particular,
the analysis addresses the co-existence of the co-authorship and citation
behaviors of scientists by focusing on the interaction patterns of the most
proficient and cited authors and, in turn, on how these patterns are affected by
the selection process of citations. Such a social selection a) produces
self-organization, because it is carried out by a group of individuals who act,
compete and collaborate in a common environment in order to advance Science, and
b) determines the success (emergence) of both topics and the scientists working
on them.

On the one hand, studies of scientific network dynamics deal with understanding
the factors that play a significant role in their evolution, not all of which
are objective or rational, e.g., the existence of a star system [38], [24],
[25], [1], the blind imitation concerning citations [20], and the reputation and
community affiliation bias [8]. On the other hand, having some elements to
understand such dynamics could enable a better detection of hot topics and
lively subfields, and of how scientific production advances with respect to the
selection process inside the community itself. Among the available data to
analyze such a system, a subset of the publications in a given field is the most
frequently used, as in [29], [23], [26], and [33]. Scientific publications
correspond to the production of such a system and clearly identify who the
producers are (the authors), which institution they belong to (the affiliation),
which funded project they are working on (the acknowledgement) and what the
related publications are (the citations). The fact that these data are publicly
accessible most of the time also partly explains their frequent use in analyses
of the scientific field. Classical analyses concern either the co-authorship
network ([1, 24]) or the citation network ([11, 34]), more rarely the
institutional network ([28]). Moreover, such networks are often considered as
static and their structure is rarely analyzed over time (an exception is the
analysis performed by [33] on Physical Review).

Emergence through Selection: The Evolution of a Scientific Challenge

The illustrative analysis presented in the paper passes through different
data transformations aimed at providing different perspectives on the scientific
network and its evolution. On a first level we discuss the dataset by means of
both a static and a temporal analysis of the citation and co-authorship
networks. A second level of analysis consists in transforming the data in order
to make explicit the interdependencies between co-authorships and citations by
analyzing the scientists' representations of the collaboration structure within
the scientific field. Such a representation is captured through the network of
cited collaborations ([32]): a publication contains several references to other
papers, and each reference corresponds to a promotion of the scientists
authoring the cited work.

One of the problems when trying to characterize such a dynamic structure
is that classical indicators from either graph theory or social network analysis
cannot be applied directly. Therefore, we used an algebra, the Time-Varying
Graphs (TVG) ([4]), that takes into account the dynamical aspects of networks
and allows for the definition of temporal indicators ([35]) to characterize
patterns in evolving structures.

Through our approach, we capture the attraction exerted by famous authors on
co-authorship behaviors and on the structural evolution of sub-communities.



2 Context 

In [24] the network of scientific collaborations, explored over several
databases, shows a clustered and small-world structure. Moreover, several
differences between the collaboration patterns of the different fields studied
are captured. Such differences have been examined in more depth in [25] with
respect to the number of papers produced by a given group of authors, the number
of collaborations and the topological distances between scientists. Peltomaki
and Alava in [27] propose a new emulative model aimed at approximating the
growth of scientific networks by incorporating bipartition and sub-linear
preferential attachment. A model for the self-assembly of creative teams based
on three parameters (team size, the rate of newcomers in the scientific
production and the tendency of authors to collaborate with the same group) has
been outlined in [9]. Connectivity patterns in a citation network have been
studied in relation to the development of the DNA theory [11]. The work of Klemm
and Eguiluz ([12]) observed that the growth patterns of real networks (e.g.
movie actors, co-authorship in science, and word synonyms) are characterized by
a clustering trend that reaches an asymptotic value larger than that of regular
lattices of the same average connectivity.

Walter Quattrociocchi and Frederic Amblard

In the field of social network analysis several works have approached the
problem of temporal metrics [10, 14, 13]. Currently, the focus is on the
definition of instruments able to capture the intrinsic properties of complex
systems' evolution, that is, characterizing the interdependencies and the
co-existence between local behaviors (interactions) and their global effects
(emergence) [6, 21, 40, 7, 30]. The research approach to characterizing the
evolution patterns of social networks was at the very beginning mainly based
upon simulations, while in the past few years, due to the large availability of
real datasets, both the methodology of analysis and the object of research have
changed ([37, 19, 13, 5, 17]). In particular, in the latter paper Leskovec
identifies as a central problem, for social networks in general and for the
analysis of scientific community networks in particular, the definition of
mathematical models able to deal with and to reproduce all the properties of
dynamical real-world networks, such as the shrinking diameter ([18]) or the
"small world" effect [39]. Currently, the instruments and paradigms addressing
this challenge are mainly based upon stochastic definitions ([16]) or
conceptualized as a sequence of snapshots of the network at different times [36].

3 Preliminaries

3.1 Time-Varying Graphs

The time-varying graph (TVG) formalism, recently introduced in [4], is a
graph formalism based on an interaction-centric point of view and offers a
concise and elegant formulation of temporal concepts and properties [35].

Let us consider a set of entities $V$ (or nodes), a set of relations $E$ among
entities (edges), and an alphabet $L$ labeling any property of a relation
(label); that is, $E \subseteq V \times V \times L$. The set $E$ enables multiple
relations between any given pair of entities, as long as these relations have
different properties, that is, for any $e_1 = (x_1, y_1, \lambda_1) \in E$,
$e_2 = (x_2, y_2, \lambda_2) \in E$,
$(x_1 = x_2 \wedge y_1 = y_2 \wedge \lambda_1 = \lambda_2) \implies e_1 = e_2$.

Relationships between entities are assumed to occur over a time span
$T \subseteq \mathbb{T}$, namely the lifetime of the system. The temporal domain
$\mathbb{T}$ is assumed to be $\mathbb{N}$ for discrete-time systems or
$\mathbb{R}$ for continuous-time systems. The time-varying graph structure is
denoted by the tuple $\mathcal{G} = (V, E, T, \rho, \zeta)$, where
$\rho : E \times T \to \{0, 1\}$, called presence function, indicates whether a
given edge is present at a given time, and $\zeta : E \times T \to \mathbb{T}$,
called latency function, indicates the time it takes to cross a given edge if
starting at a given date. As in this paper the focus is on the temporal and
structural analysis of a social network, we will deliberately omit the latency
function and consider TVGs described as $\mathcal{G} = (V, E, T, \rho)$.

TVGs as a sequence of footprints.

Given a TVG $\mathcal{G} = (V, E, T, \rho)$, one can define the footprint of
this graph from $t_1$ to $t_2$ as the static graph
$G^{[t_1,t_2)} = (V, E^{[t_1,t_2)})$ such that $\forall e \in E$,
$e \in E^{[t_1,t_2)} \iff \exists t \in [t_1, t_2), \rho(e, t) = 1$. In other
words, the footprint aggregates interactions over a given time window into
static graphs. Let the lifetime $T$ of the time-varying graph be partitioned in
consecutive sub-intervals
$\tau = [t_0, t_1), [t_1, t_2), \dots, [t_i, t_{i+1}), \dots$, where each
$[t_k, t_{k+1})$ can be noted $\tau_k$. We call sequence of footprints of
$\mathcal{G}$ according to $\tau$ the sequence
$SF(\tau) = G^{\tau_0}, G^{\tau_1}, \dots$
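As a minimal sketch of the footprint construction (not the authors' implementation), the sequence of footprints can be computed from a list of time-stamped contacts. The sketch assumes a discrete time domain, undirected edges, and a presence function given implicitly as `(u, v, t)` triples; the function name `sequence_of_footprints` and the toy data are hypothetical.

```python
def sequence_of_footprints(contacts, t_start, t_end, width):
    """Aggregate time-stamped contacts (u, v, t) into static footprints.

    Window k covers [t_start + k*width, t_start + (k+1)*width); an edge
    belongs to footprint k iff it is present at some instant of window k.
    """
    n_windows = -(-(t_end - t_start) // width)  # ceiling division
    footprints = [set() for _ in range(n_windows)]
    for u, v, t in contacts:
        if t_start <= t < t_end:
            k = (t - t_start) // width
            footprints[k].add(frozenset((u, v)))  # undirected edge
    return footprints


# Hypothetical toy data: (author, author, year of the co-authored paper).
contacts = [("a", "b", 1992), ("b", "c", 1992), ("a", "b", 1993), ("c", "d", 1994)]
footprints = sequence_of_footprints(contacts, 1992, 1995, 1)
for year, edges in zip(range(1992, 1995), footprints):
    print(year, sorted(tuple(sorted(e)) for e in edges))
```

With `width` set to one year, each element of the returned list plays the role of one yearly footprint; static indicators can then be evaluated on each element in turn.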
Expressing other Temporal Concepts

A sequence of couples
$\mathcal{J} = \{(e_1, t_1), (e_2, t_2), \dots, (e_k, t_k)\}$, such that
$\{e_1, e_2, \dots, e_k\}$ is a walk in $G$, is a journey in $\mathcal{G}$ if
and only if $\forall i, 1 \le i < k$, $\rho(e_i, t_i) = 1$ and
$t_{i+1} \ge t_i$. The $departure(\mathcal{J})$ and $arrival(\mathcal{J})$ are
respectively the starting date $t_1$ and the last date $t_k$ of a journey.

Journeys can be thought of as paths over time from a source to a destination
and therefore have both a topological and a temporal length. The topological
length of $\mathcal{J}$ is the number $|\mathcal{J}| = k$ of couples in
$\mathcal{J}$ (i.e., the number of hops); its temporal length is its end-to-end
duration: $\|\mathcal{J}\| = arrival(\mathcal{J}) - departure(\mathcal{J})$.
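The journey conditions and the two lengths can be illustrated with a short sketch. It is only an illustration under stated assumptions: edges are directed pairs `(u, v)`, dates are required to be non-decreasing along the journey, presence is checked for every couple including the last one, and the names `is_journey`, `topological_length`, `temporal_length` and the toy schedule are hypothetical.

```python
def is_journey(journey, present):
    """Check the journey conditions on a list of (edge, date) couples.

    `present(edge, t)` plays the role of the presence function. The edges
    must form a walk, each edge must be present at its date, and dates
    must not decrease (an assumption of this sketch).
    """
    for i, (edge, t) in enumerate(journey):
        if not present(edge, t):
            return False
        if i + 1 < len(journey):
            next_edge, next_t = journey[i + 1]
            if edge[1] != next_edge[0] or next_t < t:
                return False
    return True


def topological_length(journey):
    """Number of hops."""
    return len(journey)


def temporal_length(journey):
    """End-to-end duration: arrival date minus departure date."""
    return journey[-1][1] - journey[0][1]


# Hypothetical schedule: the set of (edge, date) pairs at which edges exist.
schedule = {(("a", "b"), 1), (("b", "c"), 3), (("c", "d"), 4)}
present = lambda edge, t: (edge, t) in schedule
J = [(("a", "b"), 1), (("b", "c"), 3), (("c", "d"), 4)]
print(is_journey(J, present), topological_length(J), temporal_length(J))
```

For this toy journey the topological length is 3 hops and the temporal length is 3 time units (from date 1 to date 4).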

4 Exploring the Dataset 

4.1 The Dataset 

As mentioned in the introduction, the scientific community analyzed in this
work has been extracted from the hep-th (High Energy Physics Theory) portion of
the arXiv website, an on-line repository available at http://arxiv.org/.

The dataset is composed of a collection of papers and their related citations
over the period from January 1992 to May 2003. For each paper, the set of
authors, the date of on-line publication on arXiv.org, and the references are
provided. There are 352 807 citations within the total amount of 29 555 papers
written by 59 439 authors. The breadth of the time window covered allows us to
explore the dataset in order to extract, capture and characterize the evolution
of the interaction patterns within the community by means of different data
transformations.

4.2 The Networks Description 

From the dataset, we can easily derive two graphs. The first is the
co-authorships network, which has authors as nodes and undirected links
standing for the relation of co-authoring a paper. The second is the citations
network, where nodes are the papers and the (directed) links are the references
among papers. More formally, the derived graphs can be defined as:

• the co-authorship network as $G_a : (V_a, E)$ where the nodes in $V_a$ are the
authors and links $e \in E$ connect nodes co-authoring a paper.

• the citations network as $G_c : (V_c, E)$ where the nodes in $V_c$ are the
papers and each edge $e \in E$ corresponds to a reference to another paper.
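The two derivations can be sketched as follows. This is an illustrative reading of the definitions above, not the authors' pipeline: the record layout (paper id mapped to authors and cited ids), the function names and the toy records are all assumptions.

```python
from itertools import combinations

# Hypothetical records: paper id -> (authors, ids of cited papers).
papers = {
    "p1": (["alice", "bob"], []),
    "p2": (["bob", "carol", "dave"], ["p1"]),
    "p3": (["alice"], ["p1", "p2"]),
}


def coauthorship_graph(papers):
    """Authors as nodes; one undirected edge per pair of co-authors."""
    nodes, edges = set(), set()
    for authors, _ in papers.values():
        nodes.update(authors)
        for u, v in combinations(sorted(set(authors)), 2):
            edges.add((u, v))  # undirected, stored as a sorted pair
    return nodes, edges


def citation_graph(papers):
    """Papers as nodes; a directed edge (citing, cited) per reference."""
    nodes = set(papers)
    edges = {
        (src, dst)
        for src, (_, refs) in papers.items()
        for dst in refs
        if dst in papers  # keep only references inside the dataset
    }
    return nodes, edges


Va, Ea = coauthorship_graph(papers)
Vc, Ec = citation_graph(papers)
print(sorted(Ea))
print(sorted(Ec))
```

Storing undirected edges as sorted pairs avoids counting the same collaboration twice; dropping references to papers outside the record set is a design choice of the sketch.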

In Table 1 we provide measures about the citations and collaborations 
networks. 

Network Indicators                        Ga       Gc
Network Diameter                          26       37
Network Modularity                        0.706    0.617
Network Average Clustering Coefficient    0.5006   0.156

Table 1. Co-authorships and Citations Graph Measures

The diameters (i.e., the longest shortest path between any two nodes,
respectively authors and papers) of both networks have high values. The
modularity, measuring how well a network can be partitioned into modules or
subparts, is high on both graphs, whereas the average clustering coefficient of
the collaborations graph is higher than that of the citations graph.

The networks are composed of several connected islands with few
interconnections among them, and the co-authorships network is more clustered
than the citations graph.

5 Temporalizing the Dataset

In this section we make explicit the temporal aspects (i.e. the structural
evolution) of the citations and co-authorships networks. The transformation is
performed through the time-varying graphs formalism defined in Section 3.1.

We derive two time-varying graphs: the temporal co-authorships network, with
undirected edges and authors as nodes, where a link stands for the relation of
co-authoring a paper; and the temporal citations network, having papers as
nodes and directed links representing the citations from one paper to another.
The temporal dimension of both networks is derived from the papers' submission
dates. The temporal co-authorship network has edges labeled with the date of
submission, while the temporal citations network has the nodes labeled with the
publication date of the papers citing other papers.

More formally, we can define

• the temporal co-authorships network as a quadruplet
$G_a^t : (V, E, T, \rho)$ where the nodes $v \in V$ are the authors and links
$e \in E$ connect a couple of scientists co-authoring a paper. The temporal
domain $T = [t_a, t_b)$ of the function $\rho$ is the lifetime of each node
$v$, which in this context is assumed to start at $t_a$, the submission date of
the paper, with $t_b = \infty$.

• the temporal citations network as a quadruplet $G_c^t : (V, E, T, \rho)$
where the nodes in the set $V$ are the papers and each edge $e \in E$
corresponds to a citation to another paper. As for the co-authorships network,
the temporal dimension $T = [t_a, t_b)$ of the presence function $\rho$ of
$G_c^t$ spans from the submission date of the papers to $\infty$.
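One possible reading of this presence function is sketched below: an edge becomes present at the submission date of the paper that creates it and remains present afterwards. The helper names `make_presence` and `snapshot`, and the toy dates, are hypothetical and only illustrate the definition.

```python
import math


def make_presence(first_seen, t_end=math.inf):
    """Build rho(e, t): 1 if edge e is present at time t, else 0.

    `first_seen` maps each edge to its date t_a (submission date of the
    paper creating it); the edge is present on [t_a, t_end).
    """
    def rho(edge, t):
        t_a = first_seen.get(edge)
        return int(t_a is not None and t_a <= t < t_end)
    return rho


def snapshot(first_seen, rho, t):
    """Set of edges present at time t."""
    return {edge for edge in first_seen if rho(edge, t) == 1}


# Hypothetical toy data: edge -> year of the first co-authored paper.
first_seen = {("alice", "bob"): 1994, ("bob", "carol"): 1998}
rho = make_presence(first_seen)
print(snapshot(first_seen, rho, 1996))
print(snapshot(first_seen, rho, 2003))
```

Because the upper bound is infinite, snapshots taken at later dates can only grow, which matches a cumulative view of the network.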
5.1 Citations Network Evolution

In this section we show the evolution of the temporal citations network
$G_c^t$ by using the sequence of footprints as defined in Section 3.1. The
values are computed by aggregating the interactions occurring in each
sub-interval of $SF(\tau)$, with $\tau$ fixed to one year. Figure 1(a) shows the
evolution of the clustering coefficient: the curve is characterized by a stable
trend settling on low values. The density evolution, shown in Figure 1(b),
presents the same low and decreasing behavior, meaning that both the distances
and the interconnections among nodes (citations among papers) are stable over
all the time windows observed. The modularity, shown in Figure 1(c), also has a
decreasing but stable trend.
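To make the per-footprint indicators concrete, the sketch below computes the density and the average clustering coefficient of one footprint in plain Python. It is a simplified illustration, not the tooling used for the figures: each footprint is treated as an undirected simple graph given as a set of distinct node pairs (for the directed citations graph the density denominator would be n(n-1) instead), only nodes with at least one edge are counted, and modularity is left out because it requires a community detection step.

```python
from itertools import combinations


def adjacency(edges):
    """Build a neighbor map from a set of undirected edges (u, v)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(edges):
    """Fraction of possible undirected edges that are present."""
    n = len(adjacency(edges))
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))


def average_clustering(edges):
    """Mean local clustering coefficient (0 for nodes of degree < 2)."""
    adj = adjacency(edges)
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical yearly footprints.
yearly = {
    1992: {("a", "b"), ("b", "c"), ("a", "c")},   # a triangle
    1993: {("a", "b"), ("b", "c"), ("c", "d")},   # a path
}
for year, edges in yearly.items():
    print(year, density(edges), average_clustering(edges))
```

Evaluating these functions on each element of a sequence of footprints yields one value per window, which is the kind of time series plotted in the figures.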




5.2 Co-authorships Network Evolution 

The temporal co-authorships graph presents a different structural evolution
with respect to the temporal graph of citations analyzed in the previous
section. As before, the values are computed by aggregating the interactions in
each sub-interval of $SF(\tau)$, where $\tau$ is fixed to one year. The
evolution of the average clustering coefficient over the observed time
interval, shown in Figure 2(a), has an oscillating trend with higher values than
the ones reached by the temporal citations graph. In addition, $G_a^t$ has a
more modular and denser structure, as shown in Figure 2(c) and Figure 2(b). The
captured trends are decreasing (and not stable).



Fig. 2. Co-authorships Graph Evolution: (a) Average Clustering Coefficient,
(b) Density, (c) Modularity (x-axis: years, 1990 to 2004)



6 Making Interactions Explicit

In this section we provide an additional data transformation in order to
capture more details about the evolution of our scientific network. The
dynamics in scientific communities are based upon competition and collaboration
among authors and groups of scientists. In the analysis we want to capture a)
the resulting emerging effects caused by these two opposite motivations and b)
how they are expressed in terms of collaboration and citation patterns. The
dataset analyzed in the current paper presents two explicit interactions: the
papers' co-authorships and the citations between papers. In addition, there is
an implicit level of interaction that depends upon the goals behind any
research paper: the quality and, at the same time, the necessity to be highly
cited (competition). Hence, both the collaboration and the citation strategies
are often optimized in order to have the highest impact with respect to the
problem addressed and to collect the highest number of citations. How do such
processes affect the scientific production and the structural evolution of
scientific communities?

6.1 Deriving the Interactions Graph 

We approach the data transformation in order to make explicit the
co-authorship and cited-collaboration patterns and their interdependencies. The
dataset is transformed into an undirected graph, which we call the interaction
network, having the authors as nodes and weighted links representing the
co-authorship of a paper; when a paper is cited by another work, the weights of
the links connecting the authors of the referenced paper are incremented.

More formally, the graph of the cited co-authorships is defined as a
quadruplet $G_{cc} : (V, E, T, \rho)$ on a discrete time domain. The nodes
$v \in V$ are the authors, and the set of edges $E$ represents the
collaborations on a paper's production. The nodes appear in the graph the first
time a paper they wrote is published, and each interaction (label in $L$) is
weighted with a variable $w_i$, namely the strength value of a collaboration,
which is incremented at each citation received by a paper produced by a given
couple of nodes $(u, v)$.

In the following section we analyze the behaviors and the interaction
strategies within the network of the most cited authors. Such a graph, namely
$G_i$, is a subset of the global interaction network
$G_{cc} : (V, E, T, \rho)$.

In particular, the nodes considered in the analysis are only the authors
having link strength values $w_i > 150$, that is, all the groups having more
than 150 citations on a work. Such a network, at its maximum expansion during
the 10-year temporal window observed, is composed of 12 583 nodes and 84 512
edges.
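The construction of the interaction network and the selection of the most cited collaborations can be sketched as below. This is a hedged illustration rather than the authors' code: the record layout, the function names, and the choice of an initial weight of 0 for a new collaboration are assumptions, and time is ignored for brevity.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: paper id -> (authors, ids of cited papers).
papers = {
    "p1": (["alice", "bob"], []),
    "p2": (["carol", "dave"], ["p1"]),
    "p3": (["erin"], ["p1", "p2"]),
}


def interaction_weights(papers):
    """Weight of a co-author pair = citations received by their joint papers."""
    weights = defaultdict(int)
    pairs_of = {}
    for pid, (authors, _) in papers.items():
        pairs_of[pid] = list(combinations(sorted(set(authors)), 2))
        for pair in pairs_of[pid]:
            weights[pair] += 0  # the collaboration creates the link (assumed weight 0)
    for _, refs in papers.values():
        for cited in refs:
            for pair in pairs_of.get(cited, []):
                weights[pair] += 1  # one more citation for that collaboration
    return dict(weights)


def most_cited_subgraph(weights, threshold):
    """Keep only the links whose strength exceeds the threshold."""
    return {pair: w for pair, w in weights.items() if w > threshold}


w = interaction_weights(papers)
print(w)
print(most_cited_subgraph(w, 1))
```

In the toy data the threshold is 1; applying the same filter with a threshold of 150 corresponds to the selection rule stated above.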

6.2 The Phase Transition 

Figure 3(b) shows the density values for each element of the temporal sequence
of footprints $SF(\tau)$ of the interaction network of the most proficient
scientists $G_i$. The time interval $\tau$ is fixed to one year.

The density starts from very low values and the graph then becomes
increasingly sparse during its evolution, with a very slight counter-trend
during the period between 1999 and 2000.

The growth rate of the modularity, computed on $SF(\tau)$, is shown in
Figure 3(c). It increases until 1993, then reaches its highest values during
the period between 1999 and 2000, but at a smoother rate. As far as we can see
from the modularity evolution, the interconnection among separated groups of
authors starts in 1993 and then continues, but at a more stable rate. Looking at
the curve of the average clustering coefficient shown in Figure 3(a), we can see
a phase transition occurring between 1999 and 2000 and separating a monotone
trend from a decreasing one.

Fig. 3. Collaborations Graph Evolution: (a) Average Clustering Coefficient,
(b) Density, (c) Modularity

We can interpret the modularity trend as showing that during the first phase
the nodes are divided into several separate groups, while after the phase
transition of the clustering coefficient the connections among these groups
start to become denser, producing a network structure with a smaller number of
larger communities (modules), i.e. the network tends toward structural
homogeneity.



6.3 Zooming on Interconnections 

In order to characterize the phenomena behind the phase transition outlined
in the previous section, in Figure 4 we show the trend of the average number of
edges per node in both the whole interaction network (in black) and the network
of the most proficient scientists (in red). As we can see, the phase
transition, evident in the clustering coefficient evolution in Figure 3(a), is
not caused by an increase of the number of authors in the period between 1999
and 2000, nor is it a pattern related to the whole dataset.

Fig. 4. The average number of edges per node for the entire network of the cited
co-authorships (in black) and for the network of the most proficient authors (in red)



In Table 2 we present the evolution of the average degree, the average path
length and the degree power law within the temporal window observed. As for the
previous indicators, these values are computed on the sequence of footprints
$SF(\tau)$ of $G_i$, with $\tau$ fixed to one year.



Year   Average Degree   Average Path Length   Power Law
1992   0.0095           1
1993   0.0176           1                     -1.386
1994   0.012            1                     -1.79
1995   0.0135           1.16                  -2.16
1996   0.132            1.13                  -2.27
1997   0.0118           1.12                  -2.5
1998   0.106            1.12                  -2.5
1999   0.066            3.92                  -5.08
2000   0.64             3.79                  -5.27
2001   0.6              3.82                  -5.25

Table 2. Other interaction network's measurements



In bold are the values at which the phase transition occurs. None of the
average path length (indicating the average distance among nodes), the power
law degree (measuring how closely the degree distribution of the network
follows a power law) and the average degree (counting the average number of
connections of each node) is immune to the phase transition.
7 Goals and Preferential Attachment

The time-varying graphs formalism is interaction-centric. In this section we
show how such a modeling approach is compatible with one of the most widely
used platforms for network analysis and how it makes it possible to show the
step-by-step evolution of temporal networks, as in a movie.

The most interesting phenomenon emerging from the previous section is a phase
transition in the evolution of the structure of the most proficient authors'
network, occurring between 1999 and 2000. According to the temporal analysis,
such changes in the network are caused by a particular trend regarding the
interconnections among nodes (authors) of the most cited co-authorships graph
$G_i$.

In this section we outline the network evolution by showing the formation of
the biggest community from the beginning of the phase transition (1998) to its
maximum expansion (2002). In Figure 5 we provide a sequence of screenshots
showing the nodes' aggregation patterns. The pictures are obtained through the
Gephi platform ([2]). At the beginning (Figure 5(a)) there are several
separated components, which then start to connect (Figure 5(b)).

Fig. 5. Connections within the islands: (a) several separated connected
components (b) that start to connect with each other

Notice that the edges are emphasized with respect to the strength value $w_i$
counting the number of citations of each couple of nodes. The component (group
of authors) in the center is highly cited and acts as an attractor on the
neighboring nodes, as shown in Figure 6(a), until the maximum level of
connections in the group is reached, as shown in Figure 6(b).

Fig. 6. The growing phase of interaction among authors: (a) the number of
connections among authors continues to increase, (b) the maximum level of
connectivity is reached

7.1 Zooming on the Attractors

In this section we provide a more detailed view of such a process of
aggregation toward the attractors. Let us start by introducing Figure 7, which
shows the number of citations received in each semester by the most cited paper
in our dataset.

Fig. 7. The citations trend of the most cited paper for each semester (axis
label: Trimesters)

The citation rate increases strongly after two semesters. The third semester
coincides with the interval (1999-2000) of the phase transition captured in the
previous section.

Hence, in order to understand the effect of this paper, in the following we
show a sequence of snapshots of the network structure in the neighborhood of
the authors of the most cited paper at the time it appears in our database.
Notice that in the following pictures the nodes' diameters are proportional to
the total number of citations received by their papers.

At the beginning there are only separated components, as shown in Figure 8(a).
Then a large node appears (Figure 8(b)) and, close to it, a node with a smaller
diameter but a higher number of links. The biggest node is one of the authors
of the most cited paper and, as we can see, it has a very low number of
connections (collaborations) in that time interval.




Fig. 8. The appearance of one of the most cited authors: (a) before, (b) one of
the authors (the biggest node) of the most cited paper and a smaller node with
a higher degree appear

Fig. 9. Densification through the hub node: (a) the group of authors of the
most cited papers appears; the authors are the two big nodes and the smaller
hub, (b) the portion of the graph becomes denser

In Figure 9(a) the fat node (a Nobel laureate) and the hub node are connected:
they publish a paper together with another node with a large diameter. Several
islands start to link to the clique thus formed and, as we can see in
Figure 9(b), the process of diffusion continues by means of new hubs.

The sequence of snapshots shows the interaction patterns behind the
formulation of "String Theory" and of its subsequent developments.

Authors in the same community start to migrate toward the island of the
authors of the most cited paper. The increasing community densification causes
the formation of a giant component around the group authoring the most cited
paper, which in turn makes the network denser and more homogeneous, as emerged
in the analysis of the previous sections.

It is a goal-driven preferential attachment (preferential attachment being the
mechanism used to explain the power law degree distributions in social
networks) due to the number of citations (representing the emergence through
selection) received by a given group. Authors tend to join highly cited groups
to satisfy both the quality requirement and the possibility of being highly
cited. Moreover, considering that at the beginning there are several separated
groups, the phenomenon can be interpreted as a three-fold process: a first
phase of exploration of ideas by means of separate works carried out by
separate groups, a second phase in which some of the ideas explored start to be
cited more than the others, and a third phase in which authors tend to join
groups that have produced highly cited works. This tripartition resembles the
phases of natural selection, i.e. exploration, selection and migration. In this
context such a (social) selection a) produces self-organization, because it is
carried out by a group of individuals who act, compete and collaborate in order
to advance science, and b) determines the success (emergence) of a topic and of
the scientists working on it.

7.2 Characterizing the Community Evolution 

Table 3 summarizes the network evolution by means of a) basic indicators (the
number of nodes, the number of edges and the community's diameter) and b)
aggregated indicators (the cyclomatic number and the alpha, beta, and gamma
indices). The cyclomatic number counts the number of cycles in the graph; its
magnitude characterizes the development of the nodes' accessibility. The alpha
index is the ratio between the number of cycles in the graph and their maximum
possible number. The alpha index ranges from 0 to 1, that is, from no cycles to
a completely interconnected network. The beta index is a simple measure of
connectivity: it relates the total number of edges to the total number of
nodes. The higher the value, the greater the connectivity. The gamma index
measures the ratio between the number of edges in the network and the maximum
number of possible edges among nodes. The gamma index ranges from 0 to 100,
respectively indicating the minimum and the maximum number of edges between
nodes.
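As a worked illustration of these indices, the sketch below evaluates them from the number of vertices and edges. The paper does not state the exact denominators, so the ones used here are assumptions: the cyclomatic number is taken as e - v + p with p = 1 connected component, the maximum number of cycles is the non-planar value v(v-1)/2 - (v-1), and the maximum number of edges is the planar value 3(v-2). They were chosen because they reproduce the October 00 column of Table 3 (51 vertices, 75 edges); the function name `network_indices` is hypothetical.

```python
def network_indices(v, e, p=1):
    """Connectivity indices from v vertices, e edges, p connected components."""
    cyclomatic = e - v + p
    max_cycles = v * (v - 1) // 2 - (v - 1)  # non-planar maximum (assumption)
    max_edges_planar = 3 * (v - 2)           # planar maximum (assumption)
    return {
        "cyclomatic": cyclomatic,
        "alpha": cyclomatic / max_cycles,
        "beta": e / v,
        "gamma": 100 * e / max_edges_planar,  # expressed on a 0-100 scale
    }


idx = network_indices(v=51, e=75)
print({name: round(value, 2) for name, value in idx.items()})
```

For 51 vertices and 75 edges this gives a cyclomatic number of 25, alpha of about 0.02, beta of about 1.47 and gamma of about 51.02, in line with the corresponding column of the table.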

As we can see from the evolution of these parameters, the aggregation pattern
among separated components is evident in each of the metrics proposed. In terms
of the nodes that join the community and their mutual connections, the diameter
passes through a phase of expansion over time and then tends to stabilize.
Measures      April 00  October 00  April 01  October 01  April 02  October 02  April 03
Vertices      23        51          65        66          67        70          72
Edges         29        75          99        100         106       110         114
Diameter      6         10          10        10          8         8           8
Cyclomatic    17        25          35        35          40        41          43
Alpha         0.73      0.02        0.017     0.016       0.018     0.017       0.017
Beta          1.69      1.47        1.52      1.51        1.58      1.57        1.58
Gamma         61.9      51.02       52.38     52.08       54.3      53.92       54.28

Table 3. Network measurements of the biggest community



8 Conclusions 

In this paper we characterize the evolution of a scientific community
extracted from the ArXiv hep-th (High Energy Physics Theory) repository. The
analysis starts with a static view of the dataset, showing the structure of the
citations and co-authorships graphs derived from the dataset. Then, by adding
the temporal dimension to both networks, we characterize the structural changes
of the co-authorships and citations graphs. The temporal dimension and the
metrics used for the analysis were formalized using Time-Varying Graphs (TVG),
a mathematical framework designed to represent interactions and their evolution
in dynamically changing environments.

Since we are interested in the relationships between the collaboration and
citation behaviors of scientists, we focus on the network of the most cited
authors and on its structural evolution, where several interesting aspects
emerge. The network evolves toward a denser structure, and a phase transition
occurs in the 1999-2000 time interval, causing the homogenization of
communities.

Through our approach, we capture the role played by famous authors in
co-authorship behaviors: they act as attractors on the community. The driving
force is a sort of preferential attachment driven by the number of citations
received by a given group, which, in terms of the goal of any scientific
community, indicates a strategy oriented toward community belonging.

Furthermore, the evolution of the network from a sparse and modular structure
to a denser and more homogeneous one can be interpreted as a three-fold process
reflecting natural selection. The first phase is the exploration of ideas by
means of separate works; once some ideas start to be cited (selected) more than
others, authors tend to join the groups that have produced highly cited works.
The selection is performed by individuals in a goal-oriented environment, and
such a (social) selection produces self-organization because it is carried out
by a group of individuals who act, compete and collaborate in order to advance
Science. In fact, the driving force is an emergent effect of the
interdependencies between citations and the goal of scientific production,
since the social selection determines the emergence of a topic and of the
scientists working on it by driving the so-called preferential attachment
toward groups and topics having a high potential for citations. Finally, we
show that the migration of authors toward the most cited authors (attractors)
takes place through a hub node, i.e. a node with few citations and several
co-authorships.

In the near future we are going to outline the behavior of the most proficient
scientists in terms of their aggregation patterns and of how their works are
diffused within the community, that is, to characterize the reasons behind the
selection process causing the structural evolution of the network. Such aspects
will be addressed both with new analyses on different datasets and by means of
multi-agent simulations. The former stream will be devoted to the definition of
new patterns; the latter will be used to understand how changing some
parameters of the network influences the evolution, and consequently the
quality, of the scientific production.